Unsupervised Bilingual Morpheme Segmentation and Alignment with Context-rich Hidden Semi-Markov Models
Abstract
This paper describes an unsupervised dynamic graphical model for morphological segmentation and bilingual morpheme alignment for statistical machine translation. The model extends Hidden Semi-Markov chain models by using factored output nodes and special structures for its conditional probability distributions. It relies on morpho-syntactic and lexical source-side information (part-of-speech, morphological segmentation) while learning a morpheme segmentation over the target language. Our model outperforms a competitive word alignment system in alignment quality. Used in a monolingual morphological segmentation setting, it substantially improves accuracy over previous state-of-the-art models on three Arabic and Hebrew datasets.
Similar resources
Unsupervised segmentation of hidden semi-Markov non-stationary chains
In the classical hidden Markov chain (HMC) model we have a hidden chain X, which is a Markov one, and an observed chain Y. HMC are widely used; however, in some situations they have to be replaced by the more general “hidden semi-Markov chains” (HSMC), which are particular “triplet Markov chains” (TMC) T = (X, U, Y), where the auxiliary chain U models the semi-Markovianity of X. Otherwis...
A Phrase-Based Hidden Semi-Markov Approach to Machine Translation
Statistically estimated phrase-based models promised to further the state of the art; however, several works reported a performance decrease with respect to heuristically estimated phrase-based models. In this work we present a latent variable phrase-based translation model, inspired by hidden semi-Markov models, that does not degrade the system. Experimental results report an improvement ov...
Unsupervised segmentation of randomly switching data hidden with non-Gaussian correlated noise
Hidden Markov chains (HMC) are a very powerful tool in hidden data restoration and are currently used to solve a wide range of problems. However, when these data are not stationary, estimating the parameters, which are required for unsupervised processing, poses a problem. Moreover, taking into account correlated non-Gaussian noise is difficult without model approximations. The aim of this pape...
Segmenting Continuous Motions with Hidden Semi-markov Models and Gaussian Processes
Humans divide perceived continuous information into segments to facilitate recognition. For example, humans can segment speech waves into recognizable morphemes. Analogously, continuous motions are segmented into recognizable unit actions. People can divide continuous information into segments without using explicit segment points. This capacity for unsupervised segmentation is also useful for ...
Unsupervised Segmentation of Phoneme Sequences based on Pitman-Yor Semi-Markov Model using Phoneme Length Context
Unsupervised segmentation of phoneme sequences is an essential process to obtain unknown words during spoken dialogues. In this segmentation, an input phoneme sequence without delimiters is converted into segmented sub-sequences corresponding to words. The Pitman-Yor semi-Markov model (PYSMM) is promising for this problem, but its performance degrades when it is applied to phoneme-level word seg...